Presentation of a Novel Technique for Data Quality Improvement of Commercial Proxy Log for the Use of Efficient Web Mining

Akbar Keshavarzpour*, Kimia Bazargan Lari and Haleh Homayouni

Abstract
Data cleaning is a preprocessing stage of Web mining and is widely used in most data mining systems. Although many efforts have been made to clean Web server logs, questions and ambiguities about enterprise proxy logs remain unanswered. With limited access to web sites, enterprise proxies trace web requests from various clients to various web servers, which differentiates their logs from Web server logs in both location and content; as a result, most irrelevant items, such as software update requests, cannot be filtered out by traditional data cleaning methods. In this article we first propose a method named EPLogCleaner that filters out a large number of irrelevant items based on the common prefixes of their URLs. We then evaluate EPLogCleaner on a real network traffic trace acquired from an enterprise proxy. The experimental results show that EPLogCleaner improves the data quality of enterprise proxy logs by filtering out over 30% of their URL requests, compared with traditional data cleaning methods.
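
To make the idea of prefix-based filtering concrete, the following is a minimal Python sketch. It is not the EPLogCleaner implementation, whose details the abstract does not give. It only assumes that a set of URL prefixes for irrelevant traffic is already known. The function name `filter_by_prefix`, the constant `IRRELEVANT_PREFIXES`, and all URLs are hypothetical placeholders chosen for illustration.

```python
from typing import Iterable, List, Sequence

# Hypothetical prefixes of irrelevant traffic (for example, software update
# endpoints). These are illustrative placeholders, not values from the paper.
IRRELEVANT_PREFIXES = (
    "http://update.example.com/",
    "http://cdn.example.net/antivirus/",
)


def filter_by_prefix(
    urls: Iterable[str],
    prefixes: Sequence[str] = IRRELEVANT_PREFIXES,
) -> List[str]:
    """Return the URLs that do not start with any of the given prefixes."""
    prefix_tuple = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return [url for url in urls if not url.startswith(prefix_tuple)]


# Made-up sample of requested URLs, as they might appear in a proxy log.
sample_requests = [
    "http://update.example.com/v2/patch.bin",
    "http://news.example.org/today.html",
    "http://cdn.example.net/antivirus/defs.dat",
    "http://shop.example.org/cart",
]

kept = filter_by_prefix(sample_requests)
removed_ratio = 1 - len(kept) / len(sample_requests)
```

In this sketch a request is dropped as soon as its URL matches one known prefix, so the cost per request grows with the number of prefixes; how the prefixes themselves are discovered is left open here.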